AITopics | transliteration system

transliteration system

Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data

Micallef, Kurt, Habash, Nizar, Borg, Claudia

arXiv.org Artificial Intelligence, Nov-13-2025

Maltese is a unique Semitic language that has evolved under extensive influence from Romance and Germanic languages, particularly Italian and English. Despite its Semitic roots, its orthography is based on the Latin script, creating a gap between it and its closest linguistic relatives in Arabic. In this paper, we explore whether Arabic-language resources can support Maltese natural language processing (NLP) through cross-lingual augmentation techniques. We investigate multiple strategies for aligning Arabic textual data with Maltese, including various transliteration schemes and machine translation (MT) approaches. As part of this, we also introduce novel transliteration systems that better represent Maltese orthography. We evaluate the impact of these augmentations on monolingual and multilingual models and demonstrate that Arabic-based augmentation can significantly benefit Maltese NLP tasks.

artificial intelligence, computational linguistic, natural language, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.findings-emnlp.1177

2509.12853

Country:

North America > Canada (0.28)
Europe > France (0.28)
North America > United States > Minnesota (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

ParsTranslit: Truly Versatile Tajik-Farsi Transliteration

Merchant, Rayyan, Tang, Kevin

arXiv.org Artificial Intelligence, Oct-10-2025

As a digraphic language, the Persian language utilizes two written standards: Perso-Arabic in Afghanistan and Iran, and Tajik-Cyrillic in Tajikistan. Despite the significant similarity between the dialects of each country, script differences prevent simple one-to-one mapping, hindering written communication and interaction between Tajikistan and its Persian-speaking "siblings". To overcome this, previously published efforts have investigated machine transliteration models to convert between the two scripts. Unfortunately, most efforts did not use datasets other than those they created, limiting these models to certain domains of text such as archaic poetry or word lists. A truly usable transliteration system must be capable of handling varied domains, meaning that such models lack the versatility required for real-world usage. The contrast in domain between datasets also obscures the task's true difficulty. We present a new state-of-the-art sequence-to-sequence model for Tajik-Farsi transliteration trained across all available datasets, and present two datasets of our own. Our results across domains provide a clearer understanding of the task, and set comprehensive, comparable leading benchmarks. Overall, our model achieves chrF++ and Normalized CER scores of 87.91 and 0.05 from Farsi to Tajik and 92.28 and 0.04 from Tajik to Farsi. Our model, data, and code are available at https://anonymous.4open.science/r/ParsTranslit-FB30/.
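
The abstract reports Normalized CER without defining it. As a rough illustration only, the sketch below computes a character error rate as the Levenshtein distance divided by the length of the longer string, which keeps the score between 0 and 1 (lower is better). The choice of normalizer and the function names `edit_distance` and `normalized_cer` are assumptions made here, not the authors' definition or code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_cer(ref: str, hyp: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalizer)."""
    longest = max(len(ref), len(hyp))
    if longest == 0:
        return 0.0  # two empty strings are treated as a perfect match
    return edit_distance(ref, hyp) / longest
```

For example, `normalized_cer("китоб", "китаб")` gives 0.2, since one of the five characters differs.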

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.0752

Country:

North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)
Asia > Middle East > Iran (0.24)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages

Sumanathilaka, Deshan, Anuradha, Isuri, Weerasinghe, Ruvan, Micallef, Nicholas, Hough, Julian

arXiv.org Artificial Intelligence, Jan-15-2025

The paper gives an overview of the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. It focuses on the reverse transliteration of low-resource languages in the Indo-Aryan family to their native scripts. With current keyboard systems, typing Romanized Indo-Aryan languages using ad-hoc transliterations and obtaining accurate native-script output is complex and error-prone. This task aims to introduce and evaluate a real-time reverse transliterator that converts Romanized Indo-Aryan languages to their native scripts, improving the typing experience for users. Out of 11 registered teams, four teams participated in the final evaluation phase with transliteration models for Sinhala, Hindi and Malayalam. These proposed solutions not only solve the issue of ad-hoc transliteration but also improve the usability of low-resource languages in the digital arena.

dataset, indo-aryan language, transliteration, (15 more...)

arXiv.org Artificial Intelligence

2501.05816

Country:

Asia > Sri Lanka > Western Province > Colombo > Colombo (0.04)
Asia > Pakistan (0.04)
Asia > Nepal (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches

De Mel, Yomal, Wickramasinghe, Kasun, de Silva, Nisansa, Ranathunga, Surangika

arXiv.org Artificial Intelligence, Dec-31-2024

For reasons of convenience and a lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is highly prevalent in the context of low-resource languages such as Sinhala, which have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer-based Encoder-Decoder solution. We observed that the Transformer-based method could capture many ad-hoc patterns within the Romanized scripts compared to the rule-based method. The code base associated with this paper is available on GitHub - https://github.com/kasunw22/Sinhala-Transliterator/
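
To make the rule-based baseline idea concrete, here is a minimal sketch of one common way such a system can work: a lookup table applied with greedy longest-match scanning. The four-entry table and the function name `transliterate` are toy assumptions for illustration; they are not the rule set or code from the paper's repository.

```python
# Toy rule table (hypothetical, for illustration only; not the paper's rules).
RULES = {
    "ka": "ක",
    "ma": "ම",
    "la": "ල",
    "a": "අ",
}


def transliterate(text: str, rules: dict[str, str] = RULES) -> str:
    """Greedy longest-match replacement; unmatched characters pass through."""
    max_len = max((len(key) for key in rules), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

A fixed table like this cannot handle spellings it has never seen, which is the kind of ad-hoc variation the abstract says the Transformer-based method handled better.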

large language model, machine learning, transliteration, (15 more...)

arXiv.org Artificial Intelligence

2501.00529

Country:

Asia (1.00)
Europe > Estonia (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Transliteration between Sindhi Scripts from Devanagari to Perso-Arabic

Rathore, Shivani Singh, Nathani, Bharti, Joshi, Nisheeth, Katyayan, Pragya, Dadlani, Chander Prakash

arXiv.org Artificial Intelligence, May-12-2023

In this paper, we present a script conversion (transliteration) technique that converts Sindhi text in the Devanagari script to the Perso-Arabic script. We use a hybrid approach in which part of the text is converted using a rule base and, when an ambiguity arises, a probabilistic model is used to resolve it. Using this approach, the system achieved an overall accuracy of 99.64%.
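
The hybrid idea described above (rules first, a probabilistic model only for ambiguous cases) can be sketched as follows. Everything in this block is hypothetical: the placeholder symbols stand in for real Devanagari and Perso-Arabic letters, the probabilities are invented, and a greedy bigram choice stands in for whatever probabilistic model the authors actually used.

```python
# Hypothetical rule base: each source symbol maps to one or more target candidates.
CANDIDATES = {
    "a": ["A"],
    "s": ["S"],
    "t": ["T1", "T2"],  # ambiguous: two possible target letters
}

# Hypothetical bigram probabilities P(current | previous) over target symbols.
BIGRAM = {
    ("A", "T1"): 0.2,
    ("A", "T2"): 0.8,
    ("S", "T1"): 0.9,
    ("S", "T2"): 0.1,
}
START = "<s>"  # marker for "no previous symbol"


def convert(source: str) -> list[str]:
    """Apply rules directly; consult bigram scores only when a rule is ambiguous."""
    output = []
    prev = START
    for symbol in source:
        options = CANDIDATES.get(symbol, [symbol])  # unknown symbols pass through
        if len(options) == 1:
            choice = options[0]
        else:
            # Pick the candidate with the highest P(candidate | previous output);
            # unseen pairs score 0.0, so ties fall back to the first candidate.
            choice = max(options, key=lambda c: BIGRAM.get((prev, c), 0.0))
        output.append(choice)
        prev = choice
    return output
```

Here `convert("at")` picks `T2` after `A`, while `convert("st")` picks `T1` after `S`, showing how context resolves the same ambiguous source symbol differently.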

machine learning, natural language, transliteration, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ISCON57294.2023.10112098

2305.07365

Country: Asia > India > Rajasthan (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Alkhatib

AAAI Conferences, Feb-8-2022, 11:20:10 GMT

The task of transliterating named entities from one language into another is complicated and is considered one of the challenging tasks in machine translation (MT). To build a well-performing transliteration system, we apply well-established techniques based on hybrid deep learning. The system is based on a convolutional neural network (CNN) followed by a Bi-LSTM and a CRF. The proposed hybrid mechanism is evaluated on the ANERCorp and Kalimat corpora. The results show that the neural machine translation approach can be employed to build efficient machine transliteration systems, achieving state-of-the-art results for the Arabic-English language pair.

alkhatib, transliteration system

AAAI Conferences

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Automatic Transliteration Can Help Alexa Find Data Across Language Barriers : Alexa Blogs

#artificialintelligence, Aug-26-2018, 16:14:02 GMT

As Alexa-enabled devices continue to expand into new countries, finding information across languages that use different scripts becomes a more pressing challenge. For example, a Japanese music catalogue may contain names written in English or the various scripts used in Japanese -- Kanji, Katakana, or Hiragana. When an Alexa customer, from anywhere in the world, asks for a certain song, album, or artist, we could have a mismatch between Alexa's transcription of the request and the script used in the corresponding catalogue. To address this problem, we developed a machine-learned multilingual named-entity transliteration system. Named-entity transliteration is the process of converting a name from one language script to another.

machine learning, natural language, transliteration, (16 more...)

#artificialintelligence

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)
